Abstract

We derive sufficient conditions for a family $(S^n, \rho_n, P_n)$ of metric probability spaces to have
the measure concentration property. Specifically, if the sequence $\{P_n\}$ of probability measures
satisfies a strong mixing condition (which we call $\eta$-mixing) and the sequence of metrics $\{\rho_n\}$ is
what we call $\Psi$-dominated, we show that $(S^n, \rho_n, P_n)$ is a normal Levy family. We establish these
properties for some metric probability spaces, including the possibly novel $S = [0, 1]$, $\rho_n = \|\cdot\|_1$
case.

Keywords: concentration of measure, martingale differences, metric probability space, Levy
family, strong mixing

1 Introduction 

1.1 Background 

The study of measure concentration in general metric spaces was initiated in the 1970s by
Vitali Milman, who in turn drew inspiration from Paul Levy's work (see [22] for a brief historical
exposition). Since then, various deep insights have been gained into the concentration of measure
phenomenon [14].

The words "measure" and "concentration" suggest an interplay of analytic and geometric 
aspects. Indeed, there are two essential ingredients in proving a concentration result: the random 
variable must be continuous in a strong (Lipschitz) sense, and the random process must be mixing 
in some strong sense. The simple examples we give in §4.1 illustrate how, in general, the failure
of either of these conditions to hold can prevent a random variable from being concentrated. 

A common way of summarizing the phenomenon is to say that in a high-dimensional space,
almost all of the probability is concentrated around any set whose measure is at least $\frac12$. Another
way is to say that any "sufficiently continuous" function is tightly concentrated about its mean.
To state this more formally (but still somewhat imprecisely), let $(X_i)_{1 \le i \le n}$, $X_i \in S$, be the
random process defined on the probability space $(S^n, \mathcal{F}, P)$, and $f : S^n \to \mathbb{R}$ be a function
satisfying some Lipschitz condition (and possibly others, such as convexity). A concentration of
measure result (for our purposes) is an inequality of the form

$$P\{|f(X) - \mathrm{E}f(X)| > t\} \le c\exp(-Kt^2) \qquad (1)$$

where $c > 0$ is a small constant (typically, $c = 2$) and $K > 0$ is some quantitative indicator of
the strong mixing properties of $X$. It is crucial that neither $c$ nor $K$ depend on $f$.^1

A few celebrated milestones that naturally fall into the paradigm of (1) include Levy's original
isoperimetric inequality on the sphere (see the notes and references in [13]), McDiarmid's
bounded differences inequality [18], and Marton's generalization of (2) for contracting Markov
chains [15]. (Talagrand's no-less celebrated series of results does not easily lend itself to such
a compact description.)

Building on the work of Azuma [1] and Hoeffding, McDiarmid showed that if $f : S^n \to \mathbb{R}$
has $\|f\|_{\mathrm{Lip}} \le 1$ under the normalized Hamming metric $\bar{d}_{\mathrm{Ham}}$ and $P$ is a product measure on $S^n$,
we have

$$P\{|f - \mathrm{E}f| > t\} \le 2\exp(-2nt^2) \qquad (2)$$

(he actually proved this for the more general class of weighted Hamming metrics). Using coupling
and information-theoretic inequalities, Marton showed that if the conditions on $f : S^n \to \mathbb{R}$ are
as above and $P$ is a contracting Markov measure on $S^n$ with Doeblin coefficient $\theta < 1$,



where $M_f$ is a $P$-median of $f$. Since product measures are degenerate cases of Markov measures
(with $\theta = 0$), Marton's result is a powerful generalization of (2).

Two natural directions for extending results of type (1) are to derive such inequalities for
various measures (processes) and metrics. Talagrand's paper [22] is a tour de force in proving
concentration for various (not necessarily metric) notions of distance, but it deals exclusively with
product measures. Since the publication of Marton's concentration inequality in 1996 (to our
knowledge, the first of its kind for a nonproduct, non-Haar measure), several authors proceeded
to generalize her information-theoretic approach 130], and offer alternative approaches based on
the entropy method ^1 12] or martingale techniques. Talagrand in [22] discusses strengths
and weaknesses of the martingale method, observing that "while in principle the martingale
method has a wider range of applications, in many situations the [isoperimetric] inequalities [are]
more powerful." Bearing out his first point, Kontorovich and Ramanan [11] used martingales to
derive a general strong mixing condition for concentration (in the $\bar{d}_{\mathrm{Ham}}$ metric), applying it to
weakly contracting Markov chains. Following up, Kontorovich extended the technique to hidden
Markov and Markov tree ^01 measures.

Although a detailed survey of measure concentration literature is not our intent here, we
remark that many of the results mentioned above may be described as working to extend
inequalities of type (1) to wider classes of measures and metrics by imposing different strong mixing
and Lipschitz continuity conditions. Already in [15], Marton gives a (rather stringent) mixing
condition sufficient for concentration. Later, Marton [16], [17] and Samson prove concentration
for general classes of processes in terms of various mixing coefficients; Samson applies this
to Markov chains and $\phi$-mixing processes while Marton's application concerns lattice random
fields.

In this paper, we build upon the results in [11] and give general metric and mixing conditions
that ensure the concentration of measure. We make use of a fundamental mixing coefficient,
which has appeared (under various guises) in Marton's and Samson's work, to define the notion
of $\eta$-mixing for a random process. We also define a condition on the metric space, which we call
$\Psi$-dominance. Our main result, Theorem 7.1, states that if the family of metric probability spaces
$(S^n, \rho_n, P)_{n \ge 1}$ is such that $P$ is $\eta$-mixing and $(S^n, \rho_n)_{n \ge 1}$ is $\Psi$-dominated, then $(S^n, \rho_n, P)$ is
a normal Levy family, and therefore exhibits measure concentration. We also give examples of
metric probability spaces satisfying these conditions.

^1 See [14] for a much more general notion of concentration.

1.2 Paper outline

This paper is organized as follows. In §2 we fix some notation used throughout the paper and
dispose of some measure-theoretic issues. We review Levy families and concentration functions,
and their connection to deviation inequalities, in §3. In §4 we introduce the method of bounded
martingale differences as our technique for proving measure concentration. We define the two
key notions of this paper, $\eta$-mixing and $\Psi$-dominance, in §5 and §6, respectively. Our main
concentration result for $\eta$-mixing processes with $\Psi$-dominated metrics is proved in §7. In §8 we
give examples of some natural $\Psi$-dominated metrics, and conclude the paper with a summary
and brief discussion in §9. Finally, the Appendix takes a bit of a scenic detour, examining the
two norms defined in this paper and the strength of the topologies they induce.

2 Notation and technicalities 

Random variables are capitalized ($X$), specified sequences (vectors) are written in lowercase
($x \in S^n$), the shorthand $X_i^j = (X_i, \ldots, X_j)$ is used for all sequences, and brackets denote
sequence concatenation: $[x_i^j \, x_{j+1}^k] = x_i^k$. Often, for readability, we abbreviate $[y \, w]$ as $yw$.

We use the indicator variable $\mathbb{1}_{\{\cdot\}}$ to assign 0-1 truth values to the predicate in $\{\cdot\}$. The sign
function is defined by $\operatorname{sgn}(z) = \mathbb{1}_{\{z>0\}} - \mathbb{1}_{\{z<0\}}$. The ramp function is defined by $(z)_+ = z\,\mathbb{1}_{\{z>0\}}$.

We will follow Talagrand's time-honored tradition of dispensing with measure-theoretic tech- 
nicalities, since the (well-understood) problems they raise would distract us from the big picture. 
Only in the Appendix do these issues become interesting and relevant, and are handled there 
with rigor. 

In any metric probability space $(\mathcal{X}, \rho, P)$, it is understood that $P$ is a measure on the Borel
$\sigma$-algebra generated from the topology induced by $\rho$. We will often abuse notation slightly by
suppressing the dependence on the dimensionality $n$ in the measures $P_n$. In such cases, we are
implicitly assuming that the probability measures are consistent in the sense that for each Borel
set $A \subset S^{n-1}$ we have

$$P_{n-1}(A) = \int_{A \times S} dP_n(x_1^n).$$

The probability $P$ and expectation $\mathrm{E}$ operators are defined with respect to the measure space
specified in context. To any probability space $(S^n, \mathcal{F}, P)$, we associate the canonical random
process $X = X_1^n$, $X_i \in S$, satisfying

$$P\{X \in A\} = P(A)$$

for any $A \in \mathcal{F}$.

If $\mu$ is a positive Borel measure on $(\mathcal{X}, \mathcal{F})$ and $\tau$ is a signed measure on $(\mathcal{X}, \mathcal{F})$, we define the
total variation of $\tau$ by

$$2\|\tau\|_{TV} = \sup \sum_{i=1}^{\infty} |\tau(E_i)|, \qquad (4)$$

where the supremum is over all the countable partitions $E_i$ of $\mathcal{X}$ (this quantity is necessarily
finite, by Theorem 6.4 of [20]).^2 It is a consequence of the Lebesgue-Radon-Nikodym theorem
([20], Theorem 6.12) that if $\tau \ll \mu$, with density $h$, we have

$$2\|\tau\|_{TV} = \int_{\mathcal{X}} |h| \, d\mu.$$

^2 Note the factor of 2 in (4), which typically does not appear in analysis texts but is standard in probability theory,
when $\tau$ is the difference of two probability measures.

Additionally, if $\tau$ is balanced, meaning that $\tau(\mathcal{X}) = 0$, we have

$$\|\tau\|_{TV} = \int_{\mathcal{X}} (h)_+ \, d\mu; \qquad (5)$$

this follows from the Hahn decomposition ([20], Theorem 6.14).

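If $(\mathcal{X}, \mathcal{F}, \mu)$ is a (positive) measure space, we write $L_p(\mathcal{X}, \mu)$ for the usual space of
$\mu$-measurable functions $f : \mathcal{X} \to \mathbb{R}$, whose $L_p$ norm

$$\|f\|_{L_p(\mathcal{X}, \mu)} = \left( \int_{\mathcal{X}} |f|^p \, d\mu \right)^{1/p}$$

is finite. We will write $\|\cdot\|_{L_p(\mathcal{X}, \mu)}$ as $\|\cdot\|_{L_p(\mu)}$ or just $\|\cdot\|_{L_p}$ if there is no ambiguity; when $\mu$ is the
counting measure on a discrete space, we write this as $\|\cdot\|_p$.

Likewise, the $L_\infty$ norm, $\|f\|_{L_\infty} = \operatorname{ess\,sup} |f|$, is defined via the essential supremum:

$$\operatorname{ess\,sup}_{x \in \mathcal{X}} f(x) = \inf\{a \in [-\infty, \infty] : \mu\{f(x) > a\} = 0\}.$$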

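The Hamming metric on a product space $S^n$ is the sum of the discrete metrics on $S$:

$$d_{\mathrm{Ham}}(x, y) = \sum_{i=1}^{n} \mathbb{1}_{\{x_i \ne y_i\}}$$

for $x, y \in S^n$. Sometimes we will work with the normalized Hamming metric: $\bar{d}_{\mathrm{Ham}} = \frac{1}{n} d_{\mathrm{Ham}}$.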
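For concreteness, the two metrics can be sketched in a few lines of Python. The function names `hamming` and `normalized_hamming` are illustrative choices rather than notation from the paper, and sequences are assumed to be equal-length tuples or strings.

```python
def hamming(x, y):
    """Hamming metric d_Ham: the number of coordinates where x and y differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length n")
    return sum(1 for a, b in zip(x, y) if a != b)


def normalized_hamming(x, y):
    """Normalized Hamming metric d_Ham(x, y) / n, with values in [0, 1]."""
    if len(x) == 0:
        raise ValueError("sequences must be non-empty")
    return hamming(x, y) / len(x)
```

Under the normalized metric the diameter of $S^n$ is 1 for every $n$, which is why it is the natural scale for statements about normal Levy families.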



3 Levy families and concentration in metric spaces 

A natural language for discussing measure concentration in general metric spaces is that of Levy
families. This definition is taken, with minor variations, from Chapter 6 of [19]. Let $(\mathcal{X}, \rho, P)$ be
a Borel probability space whose topology is induced by the metric $\rho$. Whenever we write $A \subset \mathcal{X}$,
it is implicit that $A$ is a Borel subset of $\mathcal{X}$. For $t > 0$, define the $t$-fattening of $A \subset \mathcal{X}$:

$$A_t = \{x \in \mathcal{X} : \rho(x, A) < t\}.$$

The concentration function $\alpha(\cdot) = \alpha_{\mathcal{X}, \rho, P}(\cdot)$ is defined by:

$$\alpha(t) = 1 - \inf\{P(A_t) : A \subset \mathcal{X},\ P(A) \ge \tfrac12\}.$$

Let $(\mathcal{X}_n, \rho_n, P_n)_{n \ge 1}$ be a family of metric probability spaces with $\operatorname{diam}_{\rho_n}(\mathcal{X}_n) < \infty$, where

$$\operatorname{diam}_{\rho_n}(\mathcal{X}_n) = \sup_{x, y \in \mathcal{X}_n} \rho_n(x, y). \qquad (6)$$

This family is called a normal Levy family if there are constants $c_1, c_2 > 0$ such that

$$\alpha_{\mathcal{X}_n, \rho_n, P_n}(t) \le c_1 \exp(-c_2 n t^2)$$

for each $t > 0$ and $n \ge 1$.

The condition of being a normal Levy family implies strong concentration of a Lipschitz
$f : \mathcal{X}_n \to \mathbb{R}$ about its median (and mean); this connection is explored in depth in [14]. In
particular, if $(\mathcal{X}, \rho, P)$ is a metric probability space and $f : \mathcal{X} \to \mathbb{R}$ is measurable, define its
modulus of continuity by

$$\omega_f(\delta) = \sup\{|f(x) - f(y)| : \rho(x, y) < \delta\}. \qquad (7)$$

A number $M_f \in \mathbb{R}$ is called a median of $f$ if

$$P\{f \le M_f\} \ge \tfrac12 \quad \text{and} \quad P\{f \ge M_f\} \ge \tfrac12$$

(a median need not be unique). These definitions immediately imply the deviation inequality (see [14])

$$P\{|f - M_f| > \omega_f(\delta)\} \le 2\alpha_{\mathcal{X}, \rho, P}(\delta),$$

which in turn yields (see [14])

$$P\{|f - M_f| > t\} \le 2\alpha_{\mathcal{X}, \rho, P}(t / \|f\|_{\mathrm{Lip}}), \qquad (8)$$

where the Lipschitz constant $\|f\|_{\mathrm{Lip}}$ is the smallest constant $C$ for which $\omega_f(\delta) \le C\delta$, for all
$\delta > 0$. In particular, (8) lets us take $\|f\|_{\mathrm{Lip}} = 1$ without loss of generality, which we shall do
below. Theorem 1.8 in [14] lets us convert concentration about a median to concentration about
any constant:

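Theorem. Let $f$ be a measurable function on a probability space $(\mathcal{X}, \mathcal{A}, P)$. Assume that for
some $a \in \mathbb{R}$ and a non-negative function $\alpha$ on $\mathbb{R}_+$ such that $\lim_{r \to \infty} \alpha(r) = 0$,

$$P\{|f - a| > r\} \le \alpha(r)$$

for all $r > 0$. Then

$$P\{|f - M_f| > r + r_0\} \le \alpha(r), \quad r > 0,$$

where $M_f$ is a $P$-median of $f$ and where $r_0 > 0$ is such that $\alpha(r_0) < \frac12$. If moreover
$\bar{\alpha} = \int_0^\infty \alpha(r)\,dr < \infty$ then $f$ is integrable, $|a - \mathrm{E}f| \le \bar{\alpha}$, and for every $r > 0$,

$$P\{|f - \mathrm{E}f| > r + \bar{\alpha}\} \le \alpha(r).$$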

Thus, for a normal Levy family, deviation inequalities for the mean and median are equivalent
up to the constants $c_1, c_2$. Theorem 1.7 in [14] is a converse to (8), showing that if Lipschitz
functions on a metric probability space $(\mathcal{X}, \rho, P)$ are tightly concentrated about their means,
this implies a rapid decay of $\alpha_{\mathcal{X}, \rho, P}(\cdot)$.

4 Concentration via martingale differences 
4.1 Background 

Let $(S^n, \mathcal{F}, P)$ be a probability space, where $\mathcal{F}$ is the usual Borel $\sigma$-algebra generated by the
finite dimensional cylinders. On this space define the random process $(X_i)_{1 \le i \le n}$, $X_i \in S$. Let
$\mathcal{F}_i$ be the $\sigma$-algebra generated by $(X_1, \ldots, X_i)$, which induces the filtration

$$\{\emptyset, S^n\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n = \mathcal{F}. \qquad (9)$$

For $i = 1, \ldots, n$ and $f \in L_1(S^n, P)$, define the martingale difference

$$V_i = \mathrm{E}[f \mid \mathcal{F}_i] - \mathrm{E}[f \mid \mathcal{F}_{i-1}]. \qquad (10)$$

It is a classical result,^3 going back to Azuma [1], that

$$P\{|f - \mathrm{E}f| > t\} \le 2\exp(-t^2 / 2D^2) \qquad (11)$$

where $D^2 \ge \sum_{i=1}^{n} \|V_i\|_\infty^2$ (the meaning of $\|V_i\|_\infty$ will be made explicit later). Thus, if we are
able to uniformly bound the martingale difference,

$$\max_{1 \le i \le n} \|V_i\|_\infty \le H_n,$$

we obtain the concentration inequality

$$P\{|f - \mathrm{E}f| > t\} \le 2\exp\left(-\frac{t^2}{2nH_n^2}\right). \qquad (12)$$

^3 See [14] for a modern presentation and a short proof of (11).

Our ability to derive results of the type in (12) will in general depend on the continuity properties
of $f$ and the mixing properties of the process $X$.
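As a quick numerical reading of (12), the sketch below evaluates its right-hand side. The function name `azuma_bound` and its argument names are hypothetical; `h` stands for the uniform bound $H_n$ on the martingale differences, and the value is capped at 1 because it bounds a probability.

```python
import math


def azuma_bound(t, n, h):
    """Right-hand side of (12): 2 * exp(-t^2 / (2 * n * h^2)), capped at 1."""
    if n <= 0 or h <= 0:
        raise ValueError("n and h must be positive")
    return min(1.0, 2.0 * math.exp(-(t * t) / (2.0 * n * h * h)))
```

For instance, when the differences are bounded by $h = 1/n$, the call `azuma_bound(t, n, 1.0 / n)` evaluates $2\exp(-nt^2/2)$ whenever that value is below 1, which decays exponentially in $n$ for fixed $t$.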

Let us give two simple examples to build up some intuition. Let $P$ be the uniform probability
measure on $\{0, 1\}^n$ and $(X_i)_{1 \le i \le n}$ be the associated (independent) process. Though different
notions of mixing exist [2], $X$ trivially satisfies them all, being an i.i.d. process. Define $f :
\{0, 1\}^n \to [0, 1]$ by

$$f(x) = x_1 \oplus x_2 \oplus \cdots \oplus x_n,$$

where $\oplus$ is addition mod 2. Since $P\{f(X) = 0\} = P\{f(X) = 1\} = \frac12$, $f$ is certainly not
concentrated about its mean (or any other constant). Though $X$ is as well-behaved as can be, $f$ is
ill-behaved in the sense that flipping any single input bit causes the output to fluctuate by 1.^4
For the second example, take $f : \{0, 1\}^n \to [0, 1]$ to be

$$f(x) = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

If $(X_i)_{1 \le i \le n}$ is the i.i.d. process from the previous example, it is easy to show that the martingale
difference in (10) is bounded by $1/n$, and so by (12), $f$ is concentrated about its mean. What
if we relax the independence condition? The simplest kind of dependence in a random process
is Markovian. Consider the homogeneous Markov process: $P\{X_1 = 0\} = P\{X_1 = 1\} = \frac12$ and
$X_{i+1} = X_i$ with probability 1. This process trivially fails to satisfy any (reasonable) definition of
mixing [2]. Our well-behaved $f$ is no longer concentrated, since we again have $P\{f(X) = 0\} =
P\{f(X) = 1\} = \frac12$.
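Both examples are small enough to check exactly by enumeration. The sketch below (all function names are illustrative, not from the paper) computes $P\{|f(X) - \mathrm{E}f(X)| > t\}$ with exact rational arithmetic for the parity and averaging functions, under the i.i.d. uniform measure and under the "frozen" Markov chain with $X_{i+1} = X_i$.

```python
from fractions import Fraction
from itertools import product


def parity(x):
    """Addition mod 2 of all coordinates."""
    return sum(x) % 2


def average(x):
    """Empirical mean of the coordinates, as an exact fraction."""
    return Fraction(sum(x), len(x))


def iid_uniform(n):
    """Uniform measure on {0,1}^n as a dict from sequences to probabilities."""
    return {x: Fraction(1, 2 ** n) for x in product((0, 1), repeat=n)}


def frozen_chain(n):
    """Markov chain with X_1 uniform and X_{i+1} = X_i with probability 1."""
    return {(0,) * n: Fraction(1, 2), (1,) * n: Fraction(1, 2)}


def deviation_prob(p, f, t):
    """Exact value of P{|f(X) - E f(X)| > t} under the measure p."""
    mean = sum(px * f(x) for x, px in p.items())
    return sum(px for x, px in p.items() if abs(f(x) - mean) > t)
```

With $n = 10$ and $t = 1/4$, the parity function deviates with probability 1 under independence, the average deviates with probability $7/64$ under independence, and the average deviates with probability 1 under the frozen chain, in line with the discussion above.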

The two examples above show that if we are to have any hope of obtaining inequalities such
as (12), we will need conditions of continuity and mixing on $f$ and $X$, respectively. Much of the
discussion in the remainder of this section builds upon the treatment in [11] for discrete spaces.

4.2 Simple bound on the martingale difference 

Let $(S^n, \mathcal{F}, P)$ be a probability space and $(X_i)_{1 \le i \le n}$ its associated random process; define the
filtration $\{\mathcal{F}_i\}$ as in (9). At this point, we make the additional assumption that $dP(x) =
p(x)\,d\mu^n(x)$ for some positive Borel product measure $\mu^n = \mu \otimes \mu \otimes \cdots \otimes \mu$ on $(S^n, \mathcal{F})$, which we
refer to as the carrying measure. In the cases of interest, $S$ will be either countable or a compact
subset of $\mathbb{R}$, and correspondingly, $\mu$ will be the counting or Lebesgue measure. Similarly, the
conditional probability $P(\cdot \mid \mathcal{F}_i) \ll \mu^{n-i}$, with density $p(\cdot \mid X_1^i = y_1^i)$. Here and below $p(x_j^n \mid y_1^i)$
will occasionally be used in place of $p(x_j^n \mid X_1^i = y_1^i)$; no ambiguity should arise.
For $f \in L_1(S^n, P)$, $1 \le i \le n$ and $y_1^i \in S^i$, define

$$V_i(f; y_1^i) = \mathrm{E}[f(X) \mid X_1^i = y_1^i] - \mathrm{E}[f(X) \mid X_1^{i-1} = y_1^{i-1}]; \qquad (13)$$

this is just the martingale difference. A slightly more tractable quantity turns out to be

$$\hat{V}_i(f; y_1^{i-1}, w_i, w_i') = \mathrm{E}[f(X) \mid X_1^i = y_1^{i-1} w_i] - \mathrm{E}[f(X) \mid X_1^i = y_1^{i-1} w_i'], \qquad (14)$$

where $w_i, w_i' \in S$. These two quantities have a simple relationship, which may be stated
symbolically as $\|V_i(f; \cdot)\|_{L_\infty(P)} \le \|\hat{V}_i(f; \cdot)\|_{L_\infty(P)}$ and is proved in the following lemma.

^4 Without making far-reaching claims, we comment on a possible connection between the oscillatory behavior of $f$
and the notorious difficulty of learning noisy parity functions. By contrast, the problem of learning conjunctions
and disjunctions under noise has been solved some time ago.

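Lemma 4.1. Suppose $f \in L_1(S^n, P)$ and $y_1^i \in S^i$. Then for any $\varepsilon > 0$ there are $w_i, w_i' \in S$
such that

$$|V_i(f; y_1^i)| \le |\hat{V}_i(f; y_1^{i-1}, w_i, w_i')| + \varepsilon. \qquad (15)$$

Proof. Let

$$a = \mathrm{E}[f(X) \mid X_1^i = y_1^i] = \int_{S^{n-i}} p(x_{i+1}^n \mid y_1^i)\, f(y_1^i x_{i+1}^n)\, d\mu^{n-i}(x_{i+1}^n);$$

then

$$V_i(f; y_1^i) = a - \int_{S^{n-i+1}} p(x_i^n \mid y_1^{i-1})\, f(y_1^{i-1} x_i^n)\, d\mu^{n-i+1}(x_i^n)$$

$$= a - \int_S p(z \mid y_1^{i-1}) \left( \int_{S^{n-i}} p(x_{i+1}^n \mid y_1^{i-1} z)\, f(y_1^{i-1} z\, x_{i+1}^n)\, d\mu^{n-i}(x_{i+1}^n) \right) d\mu(z),$$

where the last step invokes Fubini's theorem. We use the simple fact that for integrable $g, h \ge 0$,

$$\inf_z h(z) \int g(z)\,dz \le \int g(z) h(z)\,dz \le \sup_z h(z) \int g(z)\,dz,$$

together with $\int_S p(z \mid y_1^{i-1})\, d\mu(z) = 1$, to deduce, for any $\varepsilon > 0$, the existence of a $w_i' \in S$ such
that

$$|V_i(f; y_1^i)| \le \left| a - \int_{S^{n-i}} p(x_{i+1}^n \mid y_1^{i-1} w_i')\, f(y_1^{i-1} w_i' x_{i+1}^n)\, d\mu^{n-i}(x_{i+1}^n) \right| + \varepsilon.$$

Taking $w_i = y_i$, this proves the claim. □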



4.3 Martingale difference as a linear functional 

The next step is to notice that $\hat{V}_i(\cdot\,; y_1^{i-1}, w_i, w_i')$, as a functional on $L_1(S^n, P)$, is linear; in fact,
it is given by

$$\hat{V}_i(f; y_1^{i-1}, w_i, w_i') = \int_{S^n} f(x) g(x)\, d\mu^n(x) = \langle f, g \rangle \qquad (16)$$

where

The plan is to bound $\langle f, g \rangle$ using continuity properties of $f$ and mixing properties of $X$, which
will immediately lead to a result of type (12) via Lemma 4.1.



5 $\eta$-mixing
5.1 Definition

Let $(S^n, \mathcal{F}, P)$ be a probability space and $(X_i)_{1 \le i \le n}$ its associated random process. In this
section, we define a notion of mixing particularly suitable to our needs. For $1 \le i < j \le n$ and
$x \in S^i$, let

$$\mathcal{L}(X_j^n \mid X_1^i = x)$$

be the law (distribution) of $X_j^n$ conditioned on $X_1^i = x$. For $y \in S^{i-1}$ and $w, w' \in S$, define

$$\eta_{ij}(y, w, w') = \|\mathcal{L}(X_j^n \mid X_1^i = yw) - \mathcal{L}(X_j^n \mid X_1^i = yw')\|_{TV}, \qquad (18)$$

where $\|\cdot\|_{TV}$ is the total variation norm (see §2), and

$$\bar{\eta}_{ij} = \operatorname{ess\,sup}_{y \in S^{i-1},\ w, w' \in S} \eta_{ij}(y, w, w'),$$

where the essential supremum is taken with respect to the measure $P$ on $S^i$. Recall that if
$(U, \mathcal{U}, P)$ is a probability space and $f : U \to \mathbb{R}$ is measurable, $\operatorname{ess\,sup}_{x \in U} f(x)$ is the smallest
$a \in [0, \infty]$ for which $f \le a$ holds $P$-almost surely.

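Let $\Delta_n$ be the upper-triangular $n \times n$ matrix defined by $(\Delta_n)_{ii} = 1$ and

$$(\Delta_n)_{ij} = \bar{\eta}_{ij} \qquad (19)$$

for $1 \le i < j \le n$. Recall that the $\ell_\infty$ operator norm is given by

$$\|\Delta_n\|_\infty = \max_{1 \le i \le n} (1 + \bar{\eta}_{i,i+1} + \ldots + \bar{\eta}_{i,n}). \qquad (20)$$

A probability measure $P$ on $(S^{\mathbb{N}}, \mathcal{F})$ defines the function $H_P : \mathbb{N} \to \mathbb{R}$ by

$$H_P(n) = \|\Delta_n\|_\infty; \qquad (21)$$

we say that the process $X$ (measure $P$) is $\eta$-mixing if

$$\sup_{n \ge 1} H_P(n) = \bar{H}_P < \infty. \qquad (22)$$

As a trivial observation, note that if the variables $(X_i)$ are mutually independent, we have
$(\Delta_n)_{ij} = \mathbb{1}_{\{i = j\}}$ and $\|\Delta_n\|_\infty = 1$.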
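For a finite alphabet and small $n$, the coefficients in (18)-(20) can be computed by brute force, which is a useful sanity check on the definitions. The sketch below is an illustration under stated assumptions, not part of the paper: the measure is passed as a dict from length-$n$ tuples to probabilities, the function names are hypothetical, and the essential supremum is implemented by skipping prefixes of probability zero. The cost is exponential in $n$.

```python
from itertools import product


def conditional_law(p, prefix, j):
    """Law of (X_j, ..., X_n) given X_1..X_i = prefix; None if the prefix has probability 0."""
    i = len(prefix)
    law, total = {}, 0.0
    for x, px in p.items():
        if px > 0 and x[:i] == prefix:
            tail = x[j - 1:]                      # coordinates j..n (1-based)
            law[tail] = law.get(tail, 0.0) + px
            total += px
    if total == 0.0:
        return None
    return {tail: v / total for tail, v in law.items()}


def total_variation(a, b):
    """Total variation distance with the factor-of-2 convention of (4)."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def eta_bar(p, alphabet, i, j):
    """Essential supremum of eta_ij(y, w, w') over prefixes of positive probability."""
    best = 0.0
    for y in product(alphabet, repeat=i - 1):
        laws = [conditional_law(p, y + (w,), j) for w in alphabet]
        laws = [law for law in laws if law is not None]
        for a in laws:
            for b in laws:
                best = max(best, total_variation(a, b))
    return best


def delta_inf_norm(p, n, alphabet):
    """Max row sum of Delta_n, as in (20)."""
    return max(
        1.0 + sum(eta_bar(p, alphabet, i, j) for j in range(i + 1, n + 1))
        for i in range(1, n + 1)
    )
```

On the uniform measure on $\{0,1\}^3$ this returns 1, matching the independent case above. On the frozen chain of §4.1 ($X_{i+1} = X_i$) with $n = 4$ it returns 4: conditioning on $X_1$ determines the entire future, so the first row of $\Delta_n$ consists of ones.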

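5.2 Connection to $\phi$-mixing

Samson, using techniques quite different from those here, showed that if $S = [0, 1]$, and
$f : [0, 1]^n \to \mathbb{R}$ is convex with $\|f\|_{\mathrm{Lip}} \le 1$ (in the $\ell_2$ metric), then

$$P\{|f(X) - \mathrm{E}f(X)| > t\} \le 2\exp\left(-\frac{t^2}{2\|\Gamma_n\|_2^2}\right) \qquad (23)$$

where $\|\Gamma_n\|_2$ is the $\ell_2$ operator norm of the matrix^5

$$(\Gamma_n)_{ij} = \sqrt{(\Delta_n)_{ij}}. \qquad (24)$$

Following Bradley [2], for the random process $(X_i)_{i \in \mathbb{Z}}$ on $(S^{\mathbb{Z}}, \mathcal{F}, P)$, we define the $\phi$-mixing
coefficient

$$\phi_k = \sup_{j \in \mathbb{Z}} \phi(\mathcal{F}_{-\infty}^{j}, \mathcal{F}_{j+k}^{\infty}), \qquad (25)$$

where $\mathcal{F}_i^j \subset \mathcal{F}$ is the $\sigma$-algebra generated by the $X_i^j$, and for the $\sigma$-algebras $\mathcal{A}, \mathcal{B} \subset \mathcal{F}$, $\phi(\mathcal{A}, \mathcal{B})$
is defined by

$$\phi(\mathcal{A}, \mathcal{B}) = \sup\{|P(B \mid A) - P(B)| : A \in \mathcal{A},\ B \in \mathcal{B},\ P(A) > 0\}. \qquad (26)$$
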
^5 Samson used the stronger sup as opposed to ess sup in his analogue of $\bar{\eta}_{ij}$; we shall largely ignore this distinction
in our analysis.

Samson observes that

$$\bar{\eta}_{ij} \le 2\phi_{j-i}, \qquad (27)$$

which follows from

$$\|\mathcal{L}(X_j^n \mid X_1^i = y_1^{i-1} w) - \mathcal{L}(X_j^n \mid X_1^i = y_1^{i-1} w')\|_{TV} \le \|\mathcal{L}(X_j^n \mid X_1^i = y_1^{i-1} w) - \mathcal{L}(X_j^n)\|_{TV}$$
$$+ \|\mathcal{L}(X_j^n \mid X_1^i = y_1^{i-1} w') - \mathcal{L}(X_j^n)\|_{TV}.$$

This observation, together with (20), implies a sufficient condition for $\eta$-mixing:

$$\sum_{k=1}^{\infty} \phi_k < \infty; \qquad (28)$$

this certainly holds if $(\phi_k)$ admits a geometric decay, as assumed in [21].
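Combining (27) with (20) gives the explicit estimate $\|\Delta_n\|_\infty \le 1 + 2\sum_{k=1}^{n-1} \phi_k$, since row $i$ of $\Delta_n$ sums to at most $1 + 2\sum_{k=1}^{n-i} \phi_k$. The sketch below evaluates this bound for a user-supplied coefficient sequence; the function name is hypothetical and the geometric sequence in the usage note is only an example.

```python
def delta_norm_bound_from_phi(phi, n):
    """Upper bound 1 + 2 * sum_{k=1}^{n-1} phi(k) on the norm in (20), obtained via (27)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 + 2.0 * sum(phi(k) for k in range(1, n))
```

For example, with `phi = lambda k: 0.5 ** k` the bound stays below 3 for every `n`, so a process with these coefficients would be $\eta$-mixing with $\bar{H}_P \le 3$.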

Although $\eta$-mixing seems to be a stronger condition than $\phi$-mixing (the latter only requires
$\phi_k \to 0$), we are presently unable to obtain any nontrivial implications (or non-implications)
between $\eta$-mixing and either $\phi$-mixing or any of the other strong mixing conditions discussed
in [2].

5.3 Comparison between $\|\Gamma_n\|_2$ and $\|\Delta_n\|_\infty$

The quantities $\|\Gamma_n(P)\|_2$ and $\|\Delta_n(P)\|_\infty$ (written here with an explicit functional dependence
on the measure $P$) are both numerical quantifiers of the mixing properties of $P$. Because of
their role in the bounds (23) and (42), a smaller value for either quantity implies a tighter
deviation bound. It turns out that neither is uniformly asymptotically tighter than the other;
this statement is made precise in Theorem 5.3. We will first need an auxiliary lemma:

Lemma 5.1. There exists a family of probability spaces $(S^n, \mathcal{F}^n, P_n)_{n \ge 1}$ such that

$$\bar{\eta}_{ij}(P_n) = 1/(n - i) \qquad (29)$$

for $1 \le i < j \le n$.

Remark 5.2. Since different measures are being discussed, our notation will make explicit the
functional dependence of $\bar{\eta}_{ij}$ on the measure.

Proof. Let $S = \{0, 1\}$. For $1 \le k < n$, we will call $x \in \{0, 1\}^n$ a $k$-good sequence if $x_k = x_n$
and a $k$-bad sequence otherwise. Define $A_n^{(k)} \subset \{0, 1\}^n$ to be the set of the $k$-good sequences and
$B_n^{(k)} = \{0, 1\}^n \setminus A_n^{(k)}$ to be the bad sequences; note that $|A_n^{(k)}| = |B_n^{(k)}| = 2^{n-1}$. Let $P_n^{(0)}$ be the
uniform measure on $\{0, 1\}^n$:

$$P_n^{(0)}(x) = 2^{-n}, \quad x \in \{0, 1\}^n.$$

Now take $k = 1$ and define, for some $p_k \in [0, 1/2]$,

$$P_n^{(k)}(x) = \alpha_k P_n^{(k-1)}(x) \left( p_k \mathbb{1}_{\{x \in A_n^{(k)}\}} + (1 - p_k) \mathbb{1}_{\{x \in B_n^{(k)}\}} \right), \qquad (30)$$

where $\alpha_k$ is the normalizing constant, chosen so that $\sum_{x \in \{0,1\}^n} P_n^{(k)}(x) = 1$.

We will say that a probability measure $P$ on $\{0, 1\}^n$ is $k$-row homogeneous if for all $1 \le \ell \le k$
we have

(a) $h_\ell(P) = \bar{\eta}_{\ell,\ell+1}(P) = \bar{\eta}_{\ell,\ell+2}(P) = \ldots = \bar{\eta}_{\ell,n}(P)$

(b) $\bar{\eta}_{ij}(P) = 0$ for $k < i < j$

(c) $h_k$ is a continuous function of $p_k \in [0, 1/2]$, with $h_k(0) = 1$ and $h_k(1/2) = 0$.

It is straightforward to verify that $P_n^{(1)}$, as constructed in (30), is 1-row homogeneous.^6 Therefore,
we may choose $p_1$ in (30) so that $h_1 = 1/(n - 1)$. Iterating the formula in (30) we obtain the
sequence of measures $\{P_n^{(k)} : 1 \le k < n\}$; each $P_n^{(k)}$ is easily seen to be $k$-row homogeneous.

Another easily verified observation is that $h_\ell(P_n^{(k)}) = h_\ell(P_n^{(k+1)})$ for all $1 \le k < n - 1$ and
$1 \le \ell \le k$. This means that we can choose the $\{p_k\}$ so that $h_k(P_n^{(k)}) = 1/(n - k)$ for each
$1 \le k < n$. The measure $P_n = P_n^{(n-1)}$ has the desired property (29). □

Theorem 5.3. There exist families of probability spaces $(S^n, \mathcal{F}^n, P_n)_{n \ge 1}$ such that $R_n \to 0$ and
also such that $R_n \to \infty$, where

$$R_n = \frac{\|\Gamma_n(P_n)\|_2}{\|\Delta_n(P_n)\|_\infty}.$$

Proof. Recall that for an $n \times n$ real matrix $A$, its $\ell_\infty$ operator norm is given by (20) and its $\ell_2$
operator norm is given by

$$\|A\|_2 = \sup_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \sqrt{\lambda_{\max}(A^T A)}$$

where $\lambda_{\max}$ is the spectral radius. We use the standard asymptotic "big O" notation, where
if $f, g : \mathbb{N} \to \mathbb{R}_+$, we say $f = O(g)$ if $\limsup_{n \to \infty} f(n)/g(n) < \infty$. The preceding relationship
between $f$ and $g$ may also be expressed as $g = \Omega(f)$. If both $f = O(g)$ and $f = \Omega(g)$ hold, we
write $f = \Theta(g)$.

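For the first direction, let $S = \{0, 1\}$ and let $P_n$ be the measure constructed in Lemma 5.1
satisfying (29). For this measure, we have $\|\Delta_n(P_n)\|_\infty = 2$ for all $n \in \mathbb{N}$, so we proceed to
lower-bound $\|\Gamma_n(P_n)\|_2$. Letting $G_n = \Gamma_n(P)^T \Gamma_n(P)$, an easy calculation (using (24) and (29))
gives

$$(G_n)_{ij} = \sum_{k=1}^{\min(i,j)-1} \frac{1}{n - k} + \frac{\mathbb{1}_{\{i \ne j\}}}{\sqrt{n - \min(i, j)}} + \mathbb{1}_{\{i = j\}}$$

(here, $0/0 = 0$). Taking $x \in \mathbb{R}^n$ with $x_i = i$ for $1 \le i \le n$ and noting that

$$\sum_{1 \le i, j \le n} ij \min(i, j) = \Theta(n^5),$$

we conclude that $x^T G_n x = \Omega(n^4)$. Now

$$\|x\|_2 = \left( \sum_{i=1}^{n} i^2 \right)^{1/2} = \Theta(n^{3/2}),$$

so

$$\|\Gamma_n(P)\|_2 \ge \frac{\sqrt{x^T G_n x}}{\|x\|_2} = \frac{\Omega(n^2)}{\Theta(n^{3/2})} = \Omega(n^{1/2})$$

and $R_n = \Omega(n^{1/2})$.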
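The growth of $\|\Gamma_n\|_2$ in this first direction can also be observed numerically. The sketch below assumes only the entries $\bar{\eta}_{ij} = 1/(n-i)$ from (29) and builds $\Gamma_n$ via (24); the function names are hypothetical, and the spectral norm is estimated by plain power iteration on $\Gamma_n^T \Gamma_n$, which approaches the true norm from below.

```python
import math


def gamma_matrix(n):
    """Gamma_n from (24) when eta_ij = 1/(n - i): ones on the diagonal, 1/sqrt(n - i) above it."""
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # 0-based row, corresponding to index i + 1
        g[i][i] = 1.0
        for j in range(i + 1, n):
            g[i][j] = 1.0 / math.sqrt(n - (i + 1))
    return g


def delta_inf_norm(n):
    """Max row sum of Delta_n when eta_ij = 1/(n - i), as in (20)."""
    return max(
        1.0 + (n - i) * (1.0 / (n - i)) if i < n else 1.0
        for i in range(1, n + 1)
    )


def spectral_norm(a, iterations=100):
    """Lower estimate of the l2 operator norm of a square matrix by power iteration on a^T a."""
    n = len(a)
    x = [1.0] * n
    estimate = 0.0
    for _ in range(iterations):
        ax = [sum(a[r][c] * x[c] for c in range(n)) for r in range(n)]
        estimate = math.sqrt(sum(v * v for v in ax) / sum(v * v for v in x))
        x = [sum(a[r][c] * ax[r] for r in range(n)) for c in range(n)]  # a^T (a x)
        scale = math.sqrt(sum(v * v for v in x))
        x = [v / scale for v in x]
    return estimate
```

Comparing `spectral_norm(gamma_matrix(n))` for increasing `n` against the constant value of `delta_inf_norm(n)` shows the ratio $R_n$ growing, consistent with the $\Omega(n^{1/2})$ rate derived above.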



^6 The continuity of $h_k$ follows from Lemma 6.1 in

For the other direction, let $S = \{0, 1\}$ and call $x_1^n \in S^n$ a forbidden sequence if $x_1 \ne x_n$
and an allowed sequence otherwise. Define the measure $P_n$ on $S^n$ as vanishing on the forbidden
sequences and equiprobable on the allowed sequences:

$$P_n(x_1^n) = 2^{-n+1} \mathbb{1}_{\{x_1 = x_n\}}. \qquad (31)$$

For this measure, it is easy to see that

$$\bar{\eta}_{ij} = \mathbb{1}_{\{i = 1\}}, \quad 1 \le i < j \le n.$$

This forces $\|\Delta_n(P_n)\|_\infty = n$ and

$$(G_n)_{ij} = \mathbb{1}_{\{i = j > 1\}} + 1,$$

where, as before, $G_n = \Gamma_n(P_n)^T \Gamma_n(P_n)$. To upper-bound $\lambda_{\max}(G_n)$, we use a consequence of the
Gersgorin disc theorem ([7], 6.1.5), namely, that

$$\lambda_{\max}(G_n) \le \max_{1 \le i \le n} \sum_{j=1}^{n} (G_n)_{ij} = n + 1.$$

This implies $R_n = O(n^{-1/2})$. □

Remark 5.4. The last example in the proof illustrates the simple but important point that the
choice of enumeration of the random variables $\{X_i\}$ makes a difference. Let $\pi$ be the permutation
on $\{1, \ldots, n\}$ that exchanges 2 and $n$, leaving the other elements fixed. Let $(X_i)_{1 \le i \le n}$ be the
random process on $\{0, 1\}^n$ defined in (31) and define the process $Y = \pi(X)$ by $Y_i = X_{\pi(i)}$, $1 \le i \le n$.
It is easily verified that $\|\Delta_n(Y)\|_\infty = 2$ while we saw above that $\|\Delta_n(X)\|_\infty = n$. Thus if
$f : \{0, 1\}^n \to \mathbb{R}$ is invariant under permutations and $\xi_1, \xi_2 \in \mathbb{R}$ are random variables defined by
$\xi_1 = f(X)$, $\xi_2 = f(\pi(X))$, we have $\xi_1 = \xi_2$ with probability 1, yet our technique proves much
tighter concentration for $\xi_2$ than for $\xi_1$. Of course, knowing this special relationship between $\xi_1$
and $\xi_2$, we can deduce a corresponding concentration result for $\xi_1$; what is crucial is that the
concentration for $\xi_1$ is obtained by re-indexing the random variables.
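The effect of re-indexing can be checked by direct enumeration for small $n$. The sketch below is a self-contained illustration with hypothetical function names: it builds the measure (31), applies the permutation exchanging coordinates 2 and $n$, and computes $\|\Delta_n\|_\infty$ for both processes by brute force, treating the essential supremum as a maximum over prefixes of positive probability.

```python
from itertools import product


def tail_law(p, prefix, j):
    """Law of (X_j, ..., X_n) given X_1..X_i = prefix; None if the prefix has probability 0."""
    i = len(prefix)
    law, total = {}, 0.0
    for x, px in p.items():
        if px > 0 and x[:i] == prefix:
            law[x[j - 1:]] = law.get(x[j - 1:], 0.0) + px
            total += px
    return {k: v / total for k, v in law.items()} if total > 0 else None


def delta_inf_norm(p, n):
    """Max row sum of Delta_n over the binary alphabet, as in (20)."""
    rows = []
    for i in range(1, n + 1):
        row = 1.0
        for j in range(i + 1, n + 1):
            eta = 0.0
            for y in product((0, 1), repeat=i - 1):
                laws = [tail_law(p, y + (w,), j) for w in (0, 1)]
                if None not in laws:
                    keys = set(laws[0]) | set(laws[1])
                    tv = 0.5 * sum(abs(laws[0].get(k, 0.0) - laws[1].get(k, 0.0)) for k in keys)
                    eta = max(eta, tv)
            row += eta
        rows.append(row)
    return max(rows)


def measure_31(n):
    """Measure (31): uniform on binary sequences whose first and last coordinates agree."""
    return {x: 2.0 ** (-(n - 1)) for x in product((0, 1), repeat=n) if x[0] == x[-1]}


def swap_2_and_n(p):
    """Law of Y = pi(X), where pi exchanges coordinates 2 and n (1-based)."""
    out = {}
    for x, px in p.items():
        y = list(x)
        y[1], y[-1] = y[-1], y[1]
        out[tuple(y)] = out.get(tuple(y), 0.0) + px
    return out
```

For $n = 4$ the original ordering gives norm 4 while the re-indexed process gives norm 2, matching the values $n$ and 2 stated in the remark.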

Remark 5.5. Note that for the first direction in the proof of Theorem 5.3 we constructed a
sequence of measures $P_n$ such that $\|\Delta_n(P_n)\|_\infty = 2$ is bounded while $\|\Gamma_n(P_n)\|_2 = \Omega(n^{1/2})$. Is
there a sequence of measures for which $\|\Gamma_n(P_n)\|_2$ is bounded and $\|\Delta_n(P_n)\|_\infty$ unbounded? We
conjecture that such a sequence of measures indeed exists, but leave its construction for future
investigation.

Remark 5.6. In Lemma 5.1 we constructed a sequence of measures $P_n$ so that $\Delta_n(P_n)$ has a
specific form. An obvious constraint on the form of $\Delta_n$ is

and the constraint 

(**) > for 1 < i < j < n 

is easily seen to hold for all measures $P_n$ on $S^n$. Do (*) and (**) completely specify the set of
the possible $\Delta_n(P_n)$, or are there other constraints that all such matrices must satisfy? We are
inclined to conjecture the former, but leave this question open for now.

6 $\Psi$-dominance

Having dealt with the "analytic" mixing condition on the measure $P$ in §5, we now turn to the
geometry of $(S^n, \rho_n)$.

We say that the family of metric measure spaces $(S^n, \rho_n, \mu^n)_{n \ge 1}$ is consistent if

(i) the metrics $\{\rho_n\}$ satisfy, for all $1 \le i \le n$ and $x_1^n, y_1^n \in S^n$,

$$\rho_n(x_1^n, y_1^n) = \rho_{n-1}(x_1^{i-1} x_{i+1}^n,\ y_1^{i-1} y_{i+1}^n)$$

whenever $x_i = y_i$

(ii) for each $n \ge 1$, $\mu^n$ is a positive product measure on the Borel $\sigma$-algebra induced by $\rho_n$.

Remark 6.1. Condition (i) implies that the topology $\tau^n$ induced by $\rho_n$ on $S^n$ is the product
topology $\tau^n = \tau \otimes \tau \otimes \cdots \otimes \tau$, where $\tau$ is the topology induced on $S$ by $\rho_1$. Likewise, $\mu$ is a
positive measure on the Borel $\sigma$-algebra generated by $(S, \rho_1)$ and $\mu^n = \mu \otimes \mu \otimes \cdots \otimes \mu$ is the
corresponding product measure on the product $\sigma$-algebra.

A quantitative notion of continuity is the Lipschitz condition, which is defined with respect
to some metric $\rho_n$ on $S^n$. Define $\mathrm{Lip}(S^n, \rho_n)$ to be the set of all $f : S^n \to [0, \operatorname{diam}_{\rho_n}(S^n)]$ such
that

$$\sup_{x \ne y \in S^n} \frac{|f(x) - f(y)|}{\rho_n(x, y)} \le 1 \qquad (32)$$

(any such function is continuous and therefore measurable).

Remark 6.2. Since the Lipschitz condition implies $\operatorname{diam} f(S^n) \le \operatorname{diam} S^n$ and the functionals
$V_i$ and $\hat{V}_i$ (defined in (13) and (14), respectively) are translation-invariant (in the sense that
$V_i(f; y) = V_i(f + a; y)$ for all $a \in \mathbb{R}$), there is no loss of generality in restricting the range of $f$
to $[0, \operatorname{diam} S^n]$.

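Let $F_n = L_1(S^n, \mu^n)$ and equip $F_n$ with the inner product

$$\langle f, g \rangle = \int_{S^n} f(x) g(x)\, d\mu^n(x). \qquad (33)$$

Since $f, g \in F_n$ might not be in $L_2(S^n, \mu^n)$, the expression in (33) in general might not be finite.
However, for $g \in \mathrm{Lip}(S^n, \rho_n)$, we have

$$|\langle f, g \rangle| \le \operatorname{diam}_{\rho_n}(S^n)\, \|f\|_{L_1(\mu^n)} \qquad (34)$$

(the motivation for bounding $\langle f, g \rangle$ comes from (16)).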

Define the marginal projection operator $\pi : F_n \to F_{n-1}$ as follows. If $f : S^n \to \mathbb{R}$ then
$(\pi f) : S^{n-1} \to \mathbb{R}$ is given by

$$(\pi f)(x_2, \ldots, x_n) = \int_S f(x_1, x_2, \ldots, x_n)\, d\mu(x_1). \qquad (35)$$

Note that by Fubini's theorem (Thm. 8.8(c) in [20]), $\pi f \in L_1(S^{n-1}, \mu^{n-1})$. Define the functional
$\Psi_n : F_n \to \mathbb{R}$ recursively: $\Psi_0 = 0$ and

$$\Psi_n(f) = \int_{S^n} (f(x))_+\, d\mu^n(x) + \Psi_{n-1}(\pi f) \qquad (36)$$

for $n \ge 1$. The latter is finite since

$$\Psi_n(f) \le n \|f\|_{L_1(\mu^n)} \qquad (37)$$

as shown in Theorem A.1 below.
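On a finite alphabet with the counting measure, the recursion (36) becomes a short loop: sum the positive part, project out the first coordinate as in (35), and repeat. The sketch below is an illustration under that assumption; functions are stored as dicts from tuples in $S^n$ to real values, and the names `marginal_projection`, `psi`, and `psi_norm` are hypothetical.

```python
def marginal_projection(f):
    """(pi f)(x_2, ..., x_n): sum f over its first coordinate (counting measure), as in (35)."""
    out = {}
    for x, value in f.items():
        out[x[1:]] = out.get(x[1:], 0.0) + value
    return out


def psi(f):
    """Psi_n(f) from (36): positive part of f, plus Psi_{n-1} of the projection, with Psi_0 = 0."""
    total = 0.0
    while len(next(iter(f))) > 0:           # stop once all coordinates are projected out
        total += sum(max(value, 0.0) for value in f.values())
        f = marginal_projection(f)
    return total


def psi_norm(f):
    """max over s = +1, -1 of Psi_n(s f)."""
    negated = {x: -value for x, value in f.items()}
    return max(psi(f), psi(negated))
```

As a worked instance, take $S = \{0, 1\}$, $n = 2$ and $f(0,0) = 1$, $f(0,1) = -1$, $f(1,0) = -2$, $f(1,1) = 3$. The positive part sums to 4, the projection takes values $-1$ and $2$ with positive part 2, so $\Psi_2(f) = 6$, while $\Psi_2(-f) = 3 + 1 = 4$. Each of the $n$ terms in the recursion is at most $\|f\|_{L_1}$, which gives $\Psi_n(f) \le n\|f\|_{L_1}$; here $6 \le 2 \cdot 7$. Pairing $f$ with $g(x) = d_{\mathrm{Ham}}(x, (0,0))$, a member of $\mathrm{Lip}(S^2, d_{\mathrm{Ham}})$, gives $\langle f, g \rangle = 3$, which is below 6.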

We say that the family of metric spaces $(S^n, \rho_n)_{n \ge 1}$ is $\Psi$-dominated with respect to a positive
Borel measure $\mu$ on $S$ if $(S^n, \rho_n, \mu^n)_{n \ge 1}$ is consistent in the sense of (i) and (ii) above, and the
inequality

$$\sup_{g \in \mathrm{Lip}(S^n, \rho_n)} |\langle f, g \rangle| \le \Psi_n(f) \qquad (38)$$

holds for all $f \in L_1(S^n, \mu^n)$.

Theorem 6.3. Suppose $(S^n, \rho_n)_{n \ge 1}$ is a $\Psi$-dominated family of metric spaces with respect to
some (positive Borel) measure $\mu$ and $(S^n, \tau_n)_{n \ge 1}$ is another family of metric spaces, with $\tau_n$
dominated by $\rho_n$, in the sense that

$$\tau_n(x, y) \le \rho_n(x, y), \quad x, y \in S^n \qquad (39)$$

for all $n \ge 1$. Then $(S^n, \tau_n)_{n \ge 1}$ is also $\Psi$-dominated with respect to $\mu$.

Proof. By (39) we have

$$\mathrm{Lip}(S^n, \tau_n) \subset \mathrm{Lip}(S^n, \rho_n),$$

which in turn implies

$$\sup_{g \in \mathrm{Lip}(S^n, \tau_n)} |\langle f, g \rangle| \le \sup_{g \in \mathrm{Lip}(S^n, \rho_n)} |\langle f, g \rangle| \le \Psi_n(f).$$

□



We are about to define two functionals on $F_n = L_1(S^n, \mu^n)$. Although we use the norm
notation, none of the results we prove actually rely on the norm properties of $\|\cdot\|_\Phi$ and $\|\cdot\|_\Psi$,
and so we defer a discussion of these to the Appendix. The punchline is that under appropriate
conditions both are valid norms; $\|\cdot\|_\Psi$ is (topologically) equivalent to $\|\cdot\|_{L_1}$ while $\|\cdot\|_\Phi$ is in
general weaker.

The two norms are defined as

$$\|f\|_\Phi = \sup_{g \in \mathrm{Lip}(S^n, \rho_n)} |\langle f, g \rangle| \qquad (40)$$

and

$$\|f\|_\Psi = \max_{s = \pm 1} \Psi_n(sf); \qquad (41)$$

note that (38) is equivalent to the condition that $\|f\|_\Phi \le \|f\|_\Psi$ for all $f \in F_n$. We refer to the
norms in (40) and (41) as $\Phi$-norm and $\Psi$-norm, respectively; notice that both depend on the
measure $\mu$ and $\Phi$-norm also depends on the metric.



7 Main result: $\eta$-mixing with $\Psi$-dominance imply normal Levy family

Theorem 7.1. Let $(S^k, \rho_k, P)_{1 \le k \le n}$ be a $\Psi$-dominated family of metric probability spaces with
respect to a positive Borel measure $\mu$, where $P \ll \mu^n$. Then, for any Lipschitz (with respect to
$\rho_n$) $f : S^n \to \mathbb{R}$ we have

$$P\{|f - \mathrm{E}f| > t\} \le 2\exp\left(-\frac{t^2}{2n \|f\|_{\mathrm{Lip}}^2 \|\Delta_n\|_\infty^2}\right)$$

for all $t > 0$, where $\Delta_n$ is defined in (19).

Remark 7.2. A version of this result is proved in Theorem 5.1 of [11], for the special case of the
counting measure on a finite set $S^n$, where $\rho$ is the Hamming metric. Note that if we require
$\|f\|_{\mathrm{Lip}} \le 1$ with respect to the normalized metric $\bar{\rho}_n = \frac{1}{n}\rho_n$, we get

$$P\{|f - \mathrm{E}f| > t\} \le 2\exp\left(-\frac{nt^2}{2\|\Delta_n\|_\infty^2}\right); \qquad (42)$$

for $\eta$-mixing measures $P$ (see (22)), this implies $P\{|f - \mathrm{E}f| > t\} \le 2\exp(-nt^2 / 2\bar{H}_P^2)$, meaning
that the $(S^n, \bar{\rho}_n, P)$ form a normal Levy family.
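To get a feel for the scale of (42), the sketch below evaluates its right-hand side and solves it for the smallest dimension $n$ at which the bound drops below a target failure probability. This is a derived illustration rather than something stated in the paper: the function names are hypothetical, and `delta_norm` is assumed to be a bound on $\|\Delta_n\|_\infty$ that does not depend on $n$ (for example $\bar{H}_P$ for an $\eta$-mixing measure).

```python
import math


def levy_bound(t, n, delta_norm):
    """Right-hand side of (42): 2 * exp(-n t^2 / (2 * delta_norm^2)), capped at 1."""
    if n <= 0 or delta_norm <= 0:
        raise ValueError("n and delta_norm must be positive")
    return min(1.0, 2.0 * math.exp(-n * t * t / (2.0 * delta_norm ** 2)))


def dimension_needed(t, failure_prob, delta_norm):
    """Smallest n with 2 * exp(-n t^2 / (2 * delta_norm^2)) <= failure_prob."""
    if not 0.0 < failure_prob < 1.0 or t <= 0 or delta_norm <= 0:
        raise ValueError("need 0 < failure_prob < 1, t > 0 and delta_norm > 0")
    return math.ceil(2.0 * delta_norm ** 2 * math.log(2.0 / failure_prob) / (t * t))
```

Since the required dimension scales with the square of `delta_norm`, doubling the mixing constant quadruples the $n$ needed for the same guarantee.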

We will use the same conventions regarding the density dP{x) = p{x)dfi"{x) as in M4.2I 

Proof. The claim will follow via H12|) . by proving the bound 

r.(/;-)IU^(p) < ||/ILiJ|A„||oo (43) 

on the martingale difference Vi{f; •). Since ||V^i(/; and H/Hlip arc both homogeneous func- 

tional of / (in the sense of T{af) — \a\T{f) for a E M), there is no loss of generality in taking 

ll/ILip = 1- 

Lemma [4.11 shows that it suffices to bound ||Vi(/; OIkoo' and from (|15|l . we have 

nf;yl-\w,,w':) = / f{x)g{x)d^l"{x) = (44) 

where 

Let 1 < i < j < n and y e S^~^,w,w' e 5 be fixed. For 1 < fc < n, let ^Fit = Li{S^ , ^i^) and 
recall the definition H35|) of the projection operator tt : Fk ^ Ffc-i- Put N = n — i + 1 and for 
?/ e 5*^^ define the operator Ty : Fn ^ Fn by 

(T,/)(x) = /(yx) 

for each x G 5^. Observe that H45|l implies 

if, 9) = {Tyf,Tyg). (46) 

By Remark |6. 21 we may take / g Lip(5",p„), and therefore (by the consistency of the metrics, 
in the sense of ^HJi Tyf S Lip(5^,pAr). 

Let $g^{(N)} = T_y g$ and for $\ell = N, N-1, \ldots, 2$, define 

note that $g^{(\ell)} \in F_\ell$. 

A direct calculation (using the Radon-Nikodym theorem) gives 

$$g^{(n-j+1)}(x) = p(X_j^n = x \mid X_1^i = yw) - p(X_j^n = x \mid X_1^i = yw')$$

for all $x \in S^{n-j+1}$. It follows that 

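Since the measure $d\nu = g^{(n-j+1)}(x)\,d\mu^{n-j+1}(x)$ is the difference of two probability measures, we 
have $\|\nu\|_{TV} \le 1$. Thus the definition of the $\Psi_{n-i+1}$ functional (acting on $F_{n-i+1}$) yields 

$$\begin{aligned}
\|T_y g\|_\Psi &\le 1 + \sum_{j=i+1}^{n} \eta_{ij}(y, w, w') \\
&\le 1 + \sum_{j=i+1}^{n} \bar\eta_{ij} \qquad P\text{-almost surely} \\
&\le \|\Delta_n\|_\infty.
\end{aligned}$$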

Putting the above together, we obtain the desired bound in (43). □ 
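
As a rough numerical illustration of how the quantities in the proof combine, the following Python sketch computes a maximal row sum of the form $1 + \sum_{j>i} \bar\eta_{ij}$ and plugs it into a bound of the form $2\exp(-t^2/(2n\|\Delta_n\|_\infty^2))$, as restated in §8.4. The names `delta_inf_norm` and `deviation_bound`, and the geometrically decaying coefficients in the example, are assumptions made for illustration only; the definition of $\Delta_n$ itself is taken from the earlier sections and is not reproduced here.

```python
import math

def delta_inf_norm(eta_bar):
    """Maximal row sum 1 + sum_{j>i} eta_bar[i][j].

    eta_bar is an n x n list of lists; only entries with j > i are used.
    This mirrors the last two lines of the chain of inequalities above
    (an assumption about how Delta_n is laid out, see the lead-in).
    """
    n = len(eta_bar)
    return max(1.0 + sum(eta_bar[i][j] for j in range(i + 1, n))
               for i in range(n))

def deviation_bound(t, n, delta_norm, lip=1.0):
    """Right-hand side 2 * exp(-t^2 / (2 n lip^2 delta_norm^2))."""
    return 2.0 * math.exp(-t * t / (2.0 * n * lip ** 2 * delta_norm ** 2))

# Hypothetical example: coefficients decaying like theta ** (j - i).
n, theta = 50, 0.5
eta_bar = [[theta ** (j - i) if j > i else 0.0 for j in range(n)]
           for i in range(n)]
norm = delta_inf_norm(eta_bar)
print(norm)                            # below 1 / (1 - theta) = 2
print(deviation_bound(30.0, n, norm))  # bound on P{|f - Ef| > 30}
```

The sketch only evaluates the right-hand side; it says nothing about how the coefficients $\bar\eta_{ij}$ would be obtained for a concrete process.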






8 Applications 



8.1 $(\mathbb{N}^n, d_{\mathrm{Ham}})$ is $\Psi$-dominated 

A core result in [11] (Theorem 4.8) effectively established the $\Psi$-dominance of $(S^n, d_{\mathrm{Ham}})$ for 
finite $S$. For the countable case, verifying consistency is trivial. Let $S = \mathbb{N}$, $\mu$ 
be the counting measure on $S^n$, $f \in \ell_1(S^n) = L_1(S^n, \mu)$ and $g \in \mathrm{Lip}(S^n, d_{\mathrm{Ham}})$. For $m \ge 1$, let 
$S_m = \{k \in S : k \le m\}$ and define the $m$-truncation of $f$ to be the following function in $\ell_1(S^n)$: 

$$f_m(x) = \mathbf{1}_{\{x \in S_m^n\}} f(x).$$

Then we have, by [11], Theorem 4.8, 

$$\langle f_m, g \rangle \le \Psi_n(f_m)$$

for all $m \ge 1$, and $\lim_{m \to \infty} f_m(x) = f(x)$ for all $x \in S^n$. Let $h_m(x) = f_m(x) g(x)$ and note that 
$|h_m(x)| \le n|f(x)|$, the latter in $\ell_1(S^n)$. Thus by Lebesgue's Dominated Convergence theorem, we 
have $\langle f_m, g \rangle \to \langle f, g \rangle$. A similar dominated convergence argument shows that $\Psi_n(f_m) \to \Psi_n(f)$, 
which proves the $\Psi$-dominance of $(\mathbb{N}^n, d_{\mathrm{Ham}})$. 

8.2 $([0,1]^n, \|\cdot\|_1)$ is $\Psi$-dominated 

Since verifying consistency is trivial, it remains to prove 

Theorem 8.1. Let $\mu$ be the Lebesgue measure on $[0,1]$ and $\rho_n(x,y) = \|x - y\|_1$, for $x, y \in [0,1]^n$. 
Then we have 

$$\|f\|_\Phi \le \|f\|_\Psi \tag{47}$$

for all $f \in L_1([0,1]^n, \mu^n)$. 

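Proof. Let $F_n = L_1([0,1]^n, \mu^n)$ and $C_n \subset F_n$ be the class of continuous functions. It follows 
from Theorem 3.14 of [20] that $C_n$ is dense in $F_n$, in the topology induced by $\|\cdot\|_{L_1}$. This implies 
that for any $f \in F_n$ and $\varepsilon > 0$, there is a $g \in C_n$ such that $\|f - g\|_{L_1} < \varepsilon/n$ and therefore 
(since both $\|\cdot\|_\Phi$ and $\|\cdot\|_\Psi$ are bounded by $n\|\cdot\|_{L_1}$), 

$$\|f - g\|_\Phi < \varepsilon \quad \text{and} \quad \|f - g\|_\Psi < \varepsilon,$$

so it suffices to prove (47) for $f \in C_n$. 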

For $m \ge 1$, let $S_m = \{k \in \mathbb{N} : 0 \le k < m\}$. Define the grid map $\gamma_m : [0,1]^n \to S_m^n$ by 

$$[\gamma_m(x)]_i = \max\{k \in S_m : k/m \le x_i\}$$

for $x \in [0,1]^n$ and $1 \le i \le n$; $x$ is called an $m$-grid point if each coordinate $x_i$ is of the form 
$x_i = s/m$, for some $s \in S_m$. We say that $g \in F_n$ is a grid-constant function if there is an $m \ge 1$ 
such that $g(x) = g(y)$ whenever $\gamma_m(x) = \gamma_m(y)$; thus a grid-constant function is constant on the 
grid cells. Let $G_n \subset F_n$ be the class of grid-constant functions. It is easy to see that $G_n$ is dense 
in $C_n$. Indeed, for $f \in C_n$ and $\varepsilon > 0$, there is a $\delta > 0$ such that $\omega_f(\delta) < \varepsilon$, where $\omega_f$ is the $\ell_\infty$ 
modulus of continuity of $f$. Taking $m = \lceil 1/\delta \rceil$ and $g \in G_n$ to be such that it agrees with $f$ on 
the $m$-grid points, we have $\|f - g\|_{L_1([0,1]^n)} \le \|f - g\|_{L_\infty([0,1]^n)} < \varepsilon$. Thus we need only prove 
(47) for $f \in G_n$. 

Define the metric $d_m$ on $S_m$: 

$$d_m(z, z') = \frac{|z - z'|}{m}$$

and extend it to $S_m^n$: 

$$d_m(z, z') = \sum_{i=1}^{n} d_m(z_i, z_i').$$
Let $D_n \subset G_n$ consist of those functions $g : [0,1]^n \to [0,n]$ for which there is an $m \ge 1$ such that 

$$|g(x) - g(y)| \le d_m(\gamma_m(x), \gamma_m(y))$$

for all $x, y \in [0,1]^n$. The argument used above shows that $D_n$ is dense in $\mathrm{Lip}([0,1]^n, \rho_n)$, and 
so it suffices to bound $\sup_{g \in D_n} \langle f, g \rangle$ for $f \in G_n$. 

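Fix $f \in G_n$, $g \in D_n$, and let $m \ge 1$ be such that $f$ and $g$ are $m$-grid-constant functions. Let 
$\kappa, \varphi : S_m^n \to \mathbb{R}$ be such that $\kappa(\gamma_m(x)) = f(x)$ and $\varphi(\gamma_m(x)) = g(x)$ for all $x \in [0,1]^n$. Then 

$$\langle f, g \rangle = \left(\frac{1}{m}\right)^{n} \sum_{z \in S_m^n} \kappa(z)\varphi(z)$$

and 

$$\Psi_n(f) = \left(\frac{1}{m}\right)^{n} \Psi_n(\kappa),$$

where $\Psi_n(\kappa)$ is computed using the counting measure on $S_m^n$. 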

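Define $\mathrm{Lip}(S_m^n, d_m)$ and $\mathrm{Lip}(S_m^n, d_{\mathrm{Ham}})$ in accordance with (32) and note that $\varphi \in \mathrm{Lip}(S_m^n, d_m)$. 
We claim that $\mathrm{Lip}(S_m^n, d_m) \subset \mathrm{Lip}(S_m^n, d_{\mathrm{Ham}})$; this holds because $d_m(z, z') \le d_{\mathrm{Ham}}(z, z')$. 
Theorem 4.8 in [11] states that for all $\kappa : S_m^n \to \mathbb{R}$, 

$$\sup_{\varphi \in \mathrm{Lip}(S_m^n, d_{\mathrm{Ham}})} \sum_{z \in S_m^n} \kappa(z)\varphi(z) \le \Psi_n(\kappa).$$

This implies $\langle f, g \rangle \le \Psi_n(f)$ and completes the proof. □ 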
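
The discretization used in the proof can be sketched in a few lines of Python. The block below implements the grid map $\gamma_m$ with exact rational arithmetic and checks, by brute force on a small grid, the comparison $d_m \le d_{\mathrm{Ham}}$. It assumes the single-coordinate metric has the form $|z - z'|/m$; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

def grid_map(x, m):
    """Coordinate-wise max{k in {0, ..., m-1} : k/m <= x_i}."""
    return tuple(max(k for k in range(m) if Fraction(k, m) <= xi)
                 for xi in x)

def d_m(z, zp, m):
    """Sum over coordinates of |z_i - z'_i| / m (assumed form of d_m)."""
    return sum(Fraction(abs(a - b), m) for a, b in zip(z, zp))

def d_ham(z, zp):
    """Hamming distance: number of coordinates where z and z' differ."""
    return sum(1 for a, b in zip(z, zp) if a != b)

m, n = 4, 2
cells = list(product(range(m), repeat=n))
# Brute-force check of d_m <= d_Ham on the whole grid S_m^n.
assert all(d_m(z, zp, m) <= d_ham(z, zp) for z in cells for zp in cells)

print(grid_map((Fraction(1, 3), Fraction(1)), m))  # (1, 3)
```

Exact fractions are used instead of floats so that boundary points such as $x_i = k/m$ land in the cell the definition prescribes.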

Remark 8.2. One might be tempted to take a shortcut to this result by showing directly that 
$([0,1]^n, d_{\mathrm{Ham}})$ is $\Psi$-dominated and then applying Theorem 6.3 to $d_{\mathrm{Ham}}$ and $\|\cdot\|_1$. The problem 
with this approach is that $d_{\mathrm{Ham}}$ induces the discrete topology on $[0,1]^n$, whose open sets are not 
necessarily Lebesgue measurable. 

8.3 $([0,1]^n, \|\cdot\|_p)$ is $\Psi$-dominated 

Recall that for any $1 \le p \le \infty$ and any $x \in \mathbb{R}^n$, we have 

$$\|x\|_p \le \|x\|_1 \le n^{1/p'} \|x\|_p \tag{48}$$

where $1/p + 1/p' = 1$. The first inequality holds because the convex function $x \mapsto \|x\|_p$ is 
maximized on the extreme points (corners) of the convex polytope $\{x \in \mathbb{R}^n : \|x\|_1 \le 1\}$. The 
second inequality is checked by applying Hölder's inequality to $\sum x_i y_i$, with $y \equiv 1$. Both are 
tight. Furthermore, all the $\ell_p$ norms induce the same topology on $\mathbb{R}^n$, whose Borel sets are 
Lebesgue measurable. Thus, in light of Theorem 6.3, the $\Psi$-dominance (with respect to the 
Lebesgue measure, see Theorem 8.1) of $\|\cdot\|_1$ implies the $\Psi$-dominance of $\|\cdot\|_p$. 
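
The two inequalities comparing the $\ell_1$ and $\ell_p$ norms, and their tightness, are easy to sanity-check numerically. The Python sketch below does so on random vectors; the helper names `lp_norm` and `check_norm_comparison` are hypothetical, and a small tolerance is included to absorb floating-point error.

```python
import math
import random

def lp_norm(x, p):
    """The l_p norm of a finite sequence, with p = inf allowed."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def check_norm_comparison(x, p, tol=1e-9):
    """True if ||x||_p <= ||x||_1 <= n^(1/p') ||x||_p, with 1/p + 1/p' = 1."""
    n = len(x)
    inv_p_conj = 1.0 if p == float("inf") else 1.0 - 1.0 / p
    l1, lp = lp_norm(x, 1), lp_norm(x, p)
    return lp <= l1 + tol and l1 <= n ** inv_p_conj * lp + tol

random.seed(0)
for p in (1, 1.5, 2, 3, float("inf")):
    for _ in range(1000):
        x = [random.uniform(-1.0, 1.0) for _ in range(5)]
        assert check_norm_comparison(x, p)

# Tightness: a coordinate vector attains the first inequality,
# the all-ones vector attains the second.
e1, ones = [1.0, 0.0, 0.0, 0.0, 0.0], [1.0] * 5
assert math.isclose(lp_norm(e1, 2), lp_norm(e1, 1))
assert math.isclose(lp_norm(ones, 1), 5 ** 0.5 * lp_norm(ones, 2))
```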

8.4 Converting between Samson's bound and Theorem 7.1 

Let us attempt a rough comparison between the results obtained here and the main result of 
Samson's 2000 paper [2]. In light of Theorem 5.3, a uniform comparison between our mixing 
coefficient $\|\Delta_n\|_\infty$ and Samson's $\|\Gamma_n\|_2$ is not possible. However, assume for simplicity that for 
a given random process $X$ on $[0,1]^n$, the two quantities are of the same order of magnitude. For 
example, for the case of contracting Markov chains with Doeblin coefficient $\theta < 1$, we have 

$$\|\Delta_n\|_\infty \le \frac{1}{1 - \theta}, \qquad \|\Gamma_n\|_2 \le \frac{1}{1 - \theta^{1/2}}$$

(as computed in the respective references). 

Throughout this discussion, we will take $S = [0,1]$ and $\mu$ to be the Lebesgue measure. For 
$f : \mathbb{R}^n \to \mathbb{R}$, we define $\|f\|_{\mathrm{Lip},p}$ to be the (smallest) Lipschitz constant of $f$ with respect to the 
metric $d(x, y) = \|x - y\|_p$, where $1 \le p \le \infty$. 

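Suppose $f : [0,1]^n \to \mathbb{R}$ has $\|f\|_{\mathrm{Lip},2} \le 1$. Samson gives the deviation inequality 

$$P\{|f - Ef| > t\} \le 2\exp\left(-\frac{t^2}{2\|\Gamma_n\|_2^2}\right)$$

with the additional requirement that $f$ be convex. By (48) we have $\|f\|_{\mathrm{Lip},1} \le 1$ and by 
Theorem 8.1 the $\ell_1$ metric is $\Psi$-dominated. Thus, Theorem 7.1 applies: 

$$P\{|f - Ef| > t\sqrt{n}\} \le 2\exp\left(-\frac{t^2}{2\|\Delta_n\|_\infty^2}\right) \tag{49}$$

for any $f : [0,1]^n \to \mathbb{R}$ with $\|f\|_{\mathrm{Lip},2} \le 1$ (convexity is not required). 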

To convert from the bound in Theorem 7.1 to Samson's bound, we start with a convex 
$f : [0,1]^n \to \mathbb{R}$, having $\|f\|_{\mathrm{Lip},1} \le 1$. By (48), this means that $\|f\|_{\mathrm{Lip},2} \le \sqrt{n}$, or equivalently, 
$\|n^{-1/2} f\|_{\mathrm{Lip},2} \le 1$. Applying Samson's bound to $n^{-1/2} f$, we get 

$$P\{|f - Ef| > t\sqrt{n}\} \le 2\exp\left(-\frac{t^2}{2\|\Gamma_n\|_2^2}\right), \tag{50}$$

while the bound provided by Theorem 7.1 remains as stated in (49). 

We stress that the factor of $\sqrt{n}$ in (49) and (50) appears in the two bounds for rather different 
reasons. In (49), it is simply another way of stating Theorems 7.1 and 8.1 for $\|f\|_{\mathrm{Lip},1} \le 1$; 
namely, $P\{|f - Ef| > t\} \le 2\exp(-t^2/2n\|\Delta_n\|_\infty^2)$. In (50), the $\sqrt{n}$ was the "conversion cost" 
between the $\ell_1$ and the $\ell_2$ metrics. 



9 Discussion 

We have provided a general framework for proving measure concentration results in various 
metric spaces. A useful feature of our treatment is its modularity: since the geometric properties 
of the metric ($\Psi$-dominance) have been decoupled from the analytic properties of the measure 
($\eta$-mixing), Theorem 7.1 actually gives rise to a family of measure concentration results. 

While the bounds stated in terms of $\Delta_n$ are not directly comparable to the ones in terms 
of $\Gamma_n$, we provide some discussion and intuition in §5.3 and §8.4. The rough summary is that 
neither gives asymptotically tighter bounds than the other uniformly over all processes, and that 
the former is most suitable for the $\ell_1$ metric while the latter works best with $\ell_2$ (though both 
are applicable to general $\ell_p$ metrics; see §8.3 and §8.4). Samson's deviation inequality requires 
that $f$ be convex while ours does not; we also note that the $\ell_\infty$ operator norm $\|\Delta_n\|_\infty$ is often 
simpler to estimate than the spectral norm $\|\Gamma_n\|_2$. 

Comparisons aside, we have offered a new approach for studying the concentration of measure 
phenomenon and are hopeful that it will find interesting applications in future work. 



A Norm properties of $\|\cdot\|_\Phi$ and $\|\cdot\|_\Psi$ 

It was proved in [11] that $\|\cdot\|_\Phi$ and $\|\cdot\|_\Psi$ are valid norms when $S$ is finite. We now do this in 
a significantly more general setting, and examine the strength of the topologies induced by these 
norms. 
Theorem A.1. Let $F_n = L_1(S^n, \mu^n)$ for some positive Borel measure $\mu$. Then 
(a) $\|\cdot\|_\Psi$ is a vector-space norm on $F_n$ 
(b) for all $f \in F_n$, 

$$\tfrac{1}{2}\|f\|_{L_1} \le \|f\|_\Psi \le n\|f\|_{L_1}.$$

Proof. We prove (b) first. Since 

we have that $\|f\|_\Psi$ (defined in (36) and (41)) is the sum of $n$ terms, each one at most $\|f\|_{L_1}$ and 
the first one at least $\tfrac{1}{2}\|f\|_{L_1}$; this proves (b). 
To prove (a) we check the norm axioms: 

Positivity: It is obvious that $\|f\|_\Psi \ge 0$, and (b) shows that $\|f\|_\Psi = 0$ iff $f = 0$ a.e. $[\mu]$. 

Homogeneity: It is immediate from (36) that $\Psi_n(af) = a\Psi_n(f)$ for $a \ge 0$. From (41) we 
have $\|f\|_\Psi = \|-f\|_\Psi$. Together these imply $\|af\|_\Psi = |a|\,\|f\|_\Psi$. 

Subadditivity: It follows from the subadditivity of the function $h(z) = (z)_+$ and additivity of 
integration that $\|f + g\|_\Psi \le \|f\|_\Psi + \|g\|_\Psi$. □ 

Theorem A.2. Let $F_n = L_1(S^n, \mu^n)$ for some metric measure space $(S^n, \rho, \mu^n)$. Then $\|\cdot\|_\Phi$ is a 
seminorm on $F_n$. 

Proof. Nonnegativity: $\|f\|_\Phi \ge 0$ is obvious from the definition (40). 

Homogeneity: It is clear from the definition that $\|af\|_\Phi = |a|\,\|f\|_\Phi$ for any $a \in \mathbb{R}$. 

Subadditivity: $\|f + g\|_\Phi \le \|f\|_\Phi + \|g\|_\Phi$ follows from the linearity of $\langle \cdot, \cdot \rangle$ and the triangle 
inequality for $|\cdot|$. □ 

Under mild conditions on the metric measure space $(S^n, \rho, \mu^n)$, $\|\cdot\|_\Phi$ is a genuine norm. We 
will use the topological notion of local compactness (meaning that every point has a neighborhood 
with compact closure). We also require some regularity conditions on the measure $\mu^n$: 

(a) $\mu^n(K) < \infty$ for every compact set $K \subset S^n$ 

(b) for every Borel $E \subset S^n$, we have 

$$\mu^n(E) = \inf\{\mu^n(V) : E \subset V,\ V \text{ open}\}$$

(c) if $E \subset S^n$ is either open or satisfies $\mu^n(E) < \infty$ (or both), we have 

$$\mu^n(E) = \sup\{\mu^n(K) : K \subset E,\ K \text{ compact}\}.$$

These conditions are rather weak (for example, they are weaker than inner- and outer-regularity), 
and are satisfied by most interesting measures, including the counting measure on countable sets 
and the Lebesgue measure on $\mathbb{R}^n$ (see Theorem 2.14 of [20]). 

We say that a real-valued function $f$ defined on a metric space $(X, \rho)$ is locally Lipschitz if 
for each $x \in X$ there is an open $U$ with $x \in U \subset X$ and a $0 < C(x) < \infty$ such that 

$$\sup_{y \in U \setminus \{x\}} \frac{|f(x) - f(y)|}{\rho(x, y)} \le C(x).$$

Theorem A.3. Let $\mu$ be a measure on a locally compact metric space $(X, \rho)$, where $\mu$ satisfies 
the regularity conditions (a)-(c) above. Then for any $f \in L_1(X, \mu)$, $\|f\|_\Phi = 0$ iff $f = 0$ a.e. $[\mu]$. 

Proof. Suppose $f \in L_1(X, \mu)$. The case $f \le 0$ a.e. $[\mu]$ is trivial, so we assume the existence of a 
Borel $E \subset X$ such that 

$$0 < \mu(E) < \infty, \qquad f > 0 \text{ on } E.$$

Let $g(x) = \mathbf{1}_{\{x \in E\}}$ be the characteristic function of $E$ and note that $g \in L_1(X, \mu)$. 

Theorems 2.24 and 3.14 in [20] (the first is Lusin's theorem) provide a sequence of continuous 
functions $h_n$ such that 

$$\sup_{x \in X} |h_n(x)| \le \sup_{x \in X} |g(x)| = 1, \qquad \|g - h_n\|_{L_1} \to 0,$$

which implies (passing to a subsequence if necessary) $h_n \to g$ a.e. $[\mu]$. Thus by Lebesgue's Dominated Convergence theorem, we have 

$$\langle f, h_n \rangle \to \langle f, g \rangle = \int_E f\, d\mu > 0. \tag{51}$$

At this point we will need two facts: 

1. continuous functions can be uniformly approximated by locally Lipschitz functions 

2. locally Lipschitz functions can be uniformly approximated by finite linear combinations of 
members of $\mathrm{Lip}(X, \rho)$ (defined in (32)); 

both are straightforward to verify. It follows from (51) that the linear functional $\langle f, \cdot \rangle$ cannot 
vanish on all of $\mathrm{Lip}(X, \rho)$, which implies $\|f\|_\Phi > 0$. □ 

Theorem A.1 shows that $\|\cdot\|_\Psi$ is topologically equivalent to $\|\cdot\|_{L_1}$. The norm strength of $\|\cdot\|_\Phi$ 
is a more interesting matter. In the case of finite $S$, $F_n = \ell_1(S^n)$ is a finite-dimensional space 
so all norms on $F_n$ are trivially equivalent. Suppose $S$ is a countable set (equipped with the 
counting measure) and $\rho$ has the property that 

$$d = \inf_{x \ne y} \rho(x, y) > 0.$$

The functions $g(x) = d\,\mathbf{1}_{\{f(x) > 0\}}$ and $h(x) = d\,\mathbf{1}_{\{f(x) < 0\}}$ are both in $\mathrm{Lip}(S, \rho)$, and since $d\,\|f\|_1 = 
|\langle f, g \rangle| + |\langle f, h \rangle|$, we have 

$$\tfrac{d}{2}\|f\|_1 \le \|f\|_\Phi \le \mathrm{diam}_\rho(S)\,\|f\|_1 \tag{52}$$

for all $f \in F_n$, so the norms $\|\cdot\|_\Phi$ and $\|\cdot\|_1$ are equivalent in this case. 

Suppose, on the other hand, that $T = \{x_1, x_2, \ldots\}$ forms a Cauchy sequence in the countable 
space $S$, with $\delta_i = \rho(x_i, x_{i+1})$ approaching zero. Let $f \in \ell_1(S)$ be such that $f(x_{2k}) = -f(x_{2k-1})$ 
for $k = 1, 2, \ldots$ and $f(x) = 0$ for $x \notin T$; then 

$$\|f\|_\Phi \le \sum_{k=1}^{\infty} |f(x_{2k-1})|\,\delta_{2k-1} \le \|f\|_1 \sum_{k=1}^{\infty} \delta_{2k-1}. \tag{53}$$

If $S = \mathbb{Q} \cap [0,1]$ (the rationals in $[0,1]$) with $\rho(x, y) = |x - y|$ as the metric on $S$, the r.h.s. of 
(53) can be made arbitrarily small, so for this metric space, 

$$\inf\{\|f\|_\Phi : \|f\|_1 = 1\} = 0$$

and $\|\cdot\|_\Phi$ is a strictly weaker norm than $\|\cdot\|_1$. 
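
To make the last claim concrete, the following Python sketch evaluates the middle expression of the bound on the $\Phi$-norm for one hypothetical choice: the sequence $x_k = 1/k$ in the rationals of $[0,1]$, and $f$ equal to $+1/2$ at $x_{2K-1}$, $-1/2$ at $x_{2K}$, and zero elsewhere, so that $\|f\|_1 = 1$. Both the sequence and the function `phi_norm_bound` are assumptions made only for this illustration.

```python
from fractions import Fraction

def phi_norm_bound(K):
    """Upper bound on the Phi-norm for f = +1/2 at x_{2K-1}, -1/2 at x_{2K}.

    Uses the hypothetical sequence x_k = 1/k with rho(x, y) = |x - y|.
    The sum over k has a single nonzero term, |f(x_{2K-1})| * delta_{2K-1}.
    """
    x = lambda k: Fraction(1, k)
    delta = abs(x(2 * K - 1) - x(2 * K))  # distance between the paired points
    return Fraction(1, 2) * delta

for K in (1, 5, 50):
    print(K, phi_norm_bound(K))  # 1/4, 1/180, 1/19800
```

The bound decays like $1/(8K^2)$ while the $\ell_1$ norm of $f$ stays equal to one, which is the mechanism behind the vanishing infimum.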

Similarly, when $S$ is a continuous set, $\|\cdot\|_\Phi$ will be strictly weaker than $\|\cdot\|_{L_1}$ in a fairly 
general setting. As an example, take $n = 1$, $S = [0,1]$, $\mu$ the Lebesgue measure on $[0,1]$, and 
$\rho(x, y) = |x - y|$. For $N \in \mathbb{N}$, define $\gamma_N : [0,1] \to \mathbb{N}$ by 

$$\gamma_N(x) = \max\{0 \le k < N : k/N \le x\}.$$
Consider the function 

$$f_N(x) = (-1)^{\gamma_N(x)},$$

for $N = 2, 4, 6, \ldots$; note that $f_N$ is measurable and $\|f_N\|_{L_1} = 1$. 
For a fixed even $N$, define the $k$th segment 

$$I_k = \{x \in [0,1] : k \le \gamma_N(x) < k + 2\} = \left[\frac{k}{N}, \frac{k+2}{N}\right)$$

for $k = 0, 2, \ldots, N - 2$. Since $\mathrm{diam}\, I_k = 2/N$, for any $g \in \mathrm{Lip}(S, \rho)$, we have 

$$\sup_{I_k} g(x) - \inf_{I_k} g(x) \le 2/N;$$

this implies 

$$\int_{I_k} f_N(x) g(x)\, d\mu(x) \le 2/N^2.$$

Now $[0,1]$ is a union of $N/2$ such segments, so 

$$\int_0^1 f_N(x) g(x)\, d\mu(x) \le 1/N.$$

This means that $\|f_N\|_\Phi$ can be made arbitrarily small while $\|f_N\|_{L_1} = 1$, so once again $\|\cdot\|_\Phi$ is 
a strictly weaker norm than $\|\cdot\|_{L_1}$. 
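
The $1/N$ estimate can be checked exactly for one particular Lipschitz function. The Python sketch below takes $g(x) = x$ (a 1-Lipschitz function on $[0,1]$, chosen here purely as an example) and integrates $f_N(x)\,g(x)$ in closed form on each cell $[k/N, (k+1)/N)$, where $f_N$ equals $(-1)^k$. The function name is hypothetical.

```python
from fractions import Fraction

def pairing_with_identity(N):
    """Exact integral over [0, 1] of f_N(x) * x, for even N.

    On the cell [k/N, (k+1)/N) the function f_N equals (-1)**k, and the
    integral of x over [a, b) is (b^2 - a^2) / 2.
    """
    total = Fraction(0)
    for k in range(N):
        a, b = Fraction(k, N), Fraction(k + 1, N)
        total += (-1) ** k * (b * b - a * a) / 2
    return total

for N in (2, 4, 10, 100):
    value = pairing_with_identity(N)
    assert abs(value) <= Fraction(1, N)   # consistent with the 1/N estimate
    print(N, value)                       # equals -1/(2N) for this g
```

For this choice of $g$ each pair of adjacent cells contributes $-1/N^2$, so the total is $-1/(2N)$, comfortably within the $1/N$ estimate.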